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Abstract 

The solvation model proposed by Fattebert and Gygi and Scherlis et al. f3| is reformulated, 
overcoming some of the numerical limitations encountered and extending its range of applicability. 
We first recast the problem in terms of induced polarization charges that act as a direct mapping 
of the self-consistent continuum dielectric; this allows to define a functional form for the dielectric 
that is well behaved both in the high-density region of the nuclear charges and in the low-density 
region where the electronic wavefunctions decay into the solvent. Second, we outline an iterative 
procedure to solve the Poisson equation for the quantum fragment embedded in the solvent that 
does not require multi-grid algorithms, is trivially parallel, and can be applied to any Bravais 
crystallographic system. Last, we capture some of the non-electrostatic or cavitation terms via 
a combined use of the quantum volume and quantum surface [3] of the solute. The resulting 
self-consistent continuum solvation (SCCS) model provides a very effective and compact fit of 
computational and experimental data, whereby the static dielectric constant of the solvent and one 
parameter allow to fit the electrostatic energy provided by the PCM model with a mean absolute 
error of 0.3 kcal/mol on a set of 240 neutral solutes. Two parameters allow to fit experimental 
solvation energies on the same set with a mean absolute error of 1.3 kcal/mol. A detailed analysis 
of these results, broken down along different classes of chemical compounds, shows that several 
classes of organic compounds display very high accuracy, with solvation energies in error of 0.3-0.4 
kcal/mol, whereby larger discrepancies are mostly limited to self-dissociating species and strong 
hydrogen-bond forming compounds. 
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I. INTRODUCTION 



Continuum solvation models have proved to be very effective in capturing the complexity 
of a solvent in an implicit fashion Q-B|. Even if one could argue that an explicit description 
of the solvent would be a more faithful representation of the real system, an explicit solvent 
would require extensive molecular dynamics (MD) simulations in order to obtain meaningful 
thermodynamic averages. In addition, even if these were computationally feasible, an ab- 
initio density-functional theory (DFT) description of the solvent would rarely provide the 
correct results, due to the limitations in the accuracy of the functionals adopted. Critical 
limitations in the representation of the structure of liquid water with DFT are well known 
in the literature j8-12|, and can be related to the lack of a proper description of van der 
Waals interactions and hydrogen bonds in standard DFT. Moreover, given the presence of 
hydrogen atoms in liquid water, neglecting the quantum motion of the nuclei of the sys- 
tem could significantly affect the computed physical properties of the system 



13- 



The 



above limitations in the DFT description of liquid water translate into phase diagrams that 
differ from the experimental one, and melting temperatures that are much larger than the 
experimental temperature 
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12| . Being unable to accurately describe the structure and 



the dynamics of liquid water, it is questionable whether the most popular DFT functionals 
could correctly reproduce its dielectric behavior. As a typical dipolar liquid, the dielectric 
properties of liquid water derive from the electric dipole moment carried by the individ- 
ual molecules. Even though much effort has been put in the characterization of the dipole 
moment of water in its liquid state, a consensus on the subject is still lacking 16l-|l8|. More- 
over, it was shown in the literature [l^ that the dielectric response of water is dominated 
by the short range effects of the hydrogen bond environment, thus implying that the lack 
of accuracy in the description of intermolecular interactions in liquid water could lead to a 
poor characterization of its dielectric properties. 

_developed jj-?!, espe- 



19|. Within these many 



Many continuum solvation models have been proposed and wide 
daily in the chemistry literature, since the earliest work of Onsager 
approaches, one of the most popular is the Polarizable Continuum Model (PCM) of Tomasi 

n Q 

and coworkers 17| that, in its latest formulation in terms of Integral Equations (lEF) 
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2l| . encompasses a wide range of similar methods (e.g. the COSMO approach (22[). 



Being mostly linked to the chemistry community, PCM has not been used in condensed mat- 



ter and solid state simulations. In particular, the possibility to deal with metallic systems 
within PCM was introduced only later in an implicit way j23| . and the algorithm was not 
really developed to be interfaced with periodic systems and ab-initio MD simulations. 

In an effort to extend solvation methods to plane-wave, periodic-boundary codes and ab- 
initio MD, Fattebert and Gygi proposed a new model of continuum solvation 2^, where 
the dielectric is defined as a smooth self-consistent function of the electronic density of the 
solute. This model was further extended by Scherlis et al. to include the calculation of the 
cavitation energy, by defining it in terms of the quantum surface of the solute sl. Despite 



the simple and elegant formulation of the model, some ill-conditioning of the problem in 



its original formulation led to abandoning or relaxing self-consistency 



25|: models in which 



the dielectric is defined in terms of a fictitious, atom-centered electronic density have been 



proposed 



26] . together with approaches were the electronic density of the solute precom- 



le reported 
m) rely 



puted in vacuum is adopted to define the continuum solvent. In addition, all of t 
implementations and derivations of the Fattebert and Gygi method (e.g. Refs. 
on multigrid solvers, which entail high-order discretization of the Laplacian operator and 
that typically require Cartesian geometries and serial implementations. 

In the present work, starting from the model of Fattebert and Gygi, a revised self- 
consistent continuum solvation (SCCS) model is derived, in which the challenges outlined 

n 

above are tackled as follows. First, along the lines of PCM [7], the method is reimplemented 
in terms of polarization charges and solved using an iterative approach that is intrinsically 
parallel, extendable to any kind of Bravais lattice, and straightforwardly transferable into 
an y p lane- wave code. Second, some of the numerical instabilities of the original formulation 







ll, 12J| are solved by properly redefining the relation between the dielectric and the electronic 
density of the solute. Third, the model features a revised version of the extension of Scherlis 
et . .eat tKe e™tat,o„ eot.t„b.t.t. to .o,vat,o„ ftee et.e.,e. wKete t.e eo„eept. 

of quantum surfaces and quantum volumes jsl have been extended to model dispersion and 
repulsion effects in a simplified way. 

The accuracy of the proposed method has been tested on a reference sample of 240 neutral 
solutes in water. Its ability to accurately reproduce with only one fitting parameter PCM 
results provides a strong validation of the method and allows its extension to reproduce 
experimental solvation energies. The resulting agreement with experiments has a mean 
absolute error of 1.3 kcal/mol when two fitting parameters are used. Moreover, an analysis 



of the error as a function of the functional groups of the solutes provides some critical insights 
into potential limitations of the model. In particular, results for several classes of organic 
compounds present very high accuracies (with errors on the order of 0.3-0.4 kcal/mol), 
and discrepancies are mostly limited to chemical compounds that undergo dissociation in 
water, such as carboxylic acids and amines. Significant errors are also found for some strong 
hydrogen-bond forming species, such as ethers, alcohols and fiuorinated compounds. These 
findings suggest that an extension of the model that takes into account self-dissociation or 
a shell of explicit water molecules around the solute could provide results in quantitative 
agreement with experiment. 

The paper is organized as follows. In Section |TT] the main features of previous continuum 
solvation methods are reviewed. In Section IIIII the basic equations of the electrostatic part 
of the proposed method are presented. In Section [IV] some limitations of previous mod- 
els are discussed and a new definition of the dielectric function that solves most of these 
limitations is presented. In Section |V] a new numerical approach is discussed, based on an 
iterative solution of the dielectric problem, alternative of and simpler than multigrid solvers. 
In Section IVIl complementary non-electrostatic contributions to solvation are introduced. 
Eventually, in Section IVIII parametrizations of the model and a comparison with theoretical 
results from similar well assessed techniques are reported. 



II. PREVIOUS APPROACHES 

The most widespread continuum solvation method is the Polarizable Continuum Model 
by Tomasi and coworkers fzl, that in its formulation in terms of integral equations j2ol . 21 | 



represents a most general approach to continuum solvation. The basic physical picture 
behind the model is one of a solute contained in an ad-hoc cavity surrounded by a continuous 
polarizable dielectric, whose response to the solute charge distribution is fully characterized 
by the value of its static dielectric constant eo. In this model, the transition between the 
vacuum region inside the solute cavity and the surrounding dielectric continuum is sharp and 
discontinuous. This discontinuity allows to treat the effect of the surrounding environment 
on the solute through introducing a polarization charge density that is exactly localized at 
the vacuum-dielectric interface. In addition to homogeneous isotropic dielectrics, a suitable 
surface charge density can be defined via integral equations also in complex embedding 



environments, that include multiple interfaces and metallic systems. Thus, by representing 
the response of the environment in terms of an effective surface polarization density, lEF- 
PCM reduces a three-dimensional problem into a two-dimensional one. Numerically, lEF- 
PCM adopts a Boundary Element Method (BEM) to discretize the surface of the molecular 



cavity and the operators defined on the surface domain 
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28| . The final ingredient is 



;he definition of the molecular cavity: in this respect, different choices have been adopted 



29|, the most popular one being a rigid cavity built as the superposition of atom-centered 



spheres with fixed radii, corresponding to empirical van der Waals atomic radii multiplied by 
a solvent-dependent scaling factor. This choice allows to have a regular discretization of the 
cavity surface that helps numerical convergence (see Refs. 29|,|30| and references therein). On 
the other hand, the numerical discretization of the cavity surfaces has lead, in the original 
formulations, to atomic forces that were not continuous with respect to atomic positions and 
suffered from numerical singularities. For this reason it has been difficult to extend PCM to 
ab-initio MD simulations. In order to solve such a problem, modified versions of continuum 



models have been proposed in the literature 



31 



32|. Moreover, it is worth noting that more 



advanced definitions of the molecular cavity have been proposed in terms of an isodensity 
of the electronic density of the solute. Isosurfaces of both the frozen electronic density of 



the solute (Isodensity PCM |33[) and the self-consistent density (SCI-PCM [34]) have been 
considered. Nonetheless, few applications of the above methods have been reported in the 
literature, probably because of the absence of analytic gradients. 

In order to extend continuum solvation to ab-initio MD simulations, a novel approach has 
been proposed by Fattebert and Gygi . The main difference with respect to PCM is 

in the definition of a dielectric function e (r) = e ^p'^^^^ (r)) , defined in terms of the electronic 
density p^'^'^ of the solute, and smoothly varying between 1, when p'^'*^'^ is large, to cq when 
psiec g rjj^g physical problem is expressed in terms of the electrostatic response generated 
by the embedding dielectric and acting on the solute. This response field, which is defined 
in the whole three-dimensional space, is then obtained numerically by using a multigrid 
solver. The method of Fattebert and Gygi requires - as it is argued here - a careful choice 
for the dielectric function in terms of the electronic density. Moreover, its implementation in 
plane-wave (PW) ab-initio codes has been hindered by the necessity to interface such codes 
with an efficient, high-order, ideally parallel multigrid solver able to work in the arbitrary 
geometry of any Bravais lattice. 
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III. PRESENT MODEL 

We show now that, starting from the basic equations of Fattebert and Gygi , it is 

possible to recast the electrostatic problem in terms of a polarization density, similarly to 
what is done in PCM. The key assumption is that of a dielectric medium self-consistently 
modeled on the electronic density of the solute, via a suitably defined relationship 

e (r) = e (p^'^ (r)) , (1) 

such that the dielectric is excluded (e = 1) from the inner part of the solute, where the 
electronic density is high, while it smoothly goes to the bulk dielectric constant of the 
solvent (e = eo) outside the solute, where the electronic density goes to zero. By adding 
a dielectric medium to the system, the electrostatic field that enters into the quantum- 
mechanical problem is no longer given by the Poisson equation in vacuum 

V (r) = -47rp^°'"*'^ (r) , (2) 

where the total charge density in vacuum is the sum of the electronic and ionic densities 

^solute ^ pdec ^ ^.ons ^ (3) 

but has to be the solution of the more complex Poisson equation 

V ■ e (p"'^" (r)) V0*°* (r) = -47rp^°'"*" (r) (4) 
that is nothing but the standard Maxwell equation 

V ■ D (r) = 47rp^°'"*^ (r) , (5) 

for the displacement field 

D (r) = E (r) + 47rP (r) = e (r) E (r) , (6) 

where E (r) and P (r) are the electric field and the polarization. 

Eq. ([5]) can be transformed into an equation in terms of the electric field 

V ■ E (r) = 47rp'°'"*^ (r) - 47rV ■ P (r) (7) 

and, by mapping the effect of the dielectric into a polarization charge density 



(r) ^ -V ■ P (r) = V ■ I 11^!!^)LJv0*°* (r) | , (8) 



a vacuum-like Poisson problem is recovered 

V20*°* (r) = -47r (r) + p^"^ (r)) . (9) 

As in PCM, the nonlinear nature of the problem that arises from the mutual polarization 
of the solvent and solute is accounted for via a polarization charge density that depends on 
itself through the total electrostatic potential. 

By evaluating the gradient on the right hand side of Eq. ([8]) and performing some simple 
algebraic manipulations, the polarization charge density can be expressed alternatively as 
the sum of two distinct terms: 

(r) = ^ V In e (p'^'- (r)) ■ V^*"* (r) - ^^ (^el (,)) (r) • (10) 

As in the case of PCM, introducing polarization charges allows to deal with localized physical 
quantities. Indeed, both terms in the above equations are only defined in a narrow region 
around the solute. In particular, the first term contains all the nonlinear character of the 
problem, and is confined within the vacuum-solvent interface, where the dielectric function 
has a gradient different from zero. The second term, instead, is a simple rescaling of the 
solute charge density that extends into the vacuum region (see Ref. jssl] and references 
therein): although in principle defined in the whole three-dimensional space, this term is 
also localized around the solute owing to the exponential decay of the electronic charge 
density in the dielectric region. 

It should be noted that formulating the problem in terms of the polarization charge 
density instead of the solvent electrostatic potential allows - thanks to the Gauss theorem - 
to define a sum rule for the total polarization charge surrounding the solute, namely, 

ool f„\ j„ ^0 1 / ^solute 



ff°' (r) dr = / p""'"*^ (r) dr. (11) 

eo J 

Eventually, from the linearity of the Poisson equation Eq. ([9]), it follows that the total field 
can also be written as 

0*°* (r) = 0^°'"*^ (r) + (r) . (12) 

where the 0''°'"*^ or 0^°' fields are solutions of vacuum-like Poisson equations in terms of the 
psoiute ^oi c}^aj.gg densities. Such a formal separation of the potential is useful in order 
to express all of the important quantities that enter into quantum-mechanical simulations 
(total energy, potentials, forces, etc.) in terms of two contributions: one explicitly depending 
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on the solute charge density alone (identified, now and in the following, by the superscript 
solute) and one explicitly depending on the dielectric (identified by the superscript pol). 
The first term is analogous to a quantity computed in vacuum and it is readily provided by 
the simulation code adopted. On the other hand, the polarization contributions need to be 
explicitly added, in order to include the effect of the dielectric in the simulation. Although 
the method can be applied to different quantum-mechanical approaches, in the present 
article we will limit our discussion to the case of Density Functional Theory (DFT) in the 
Kohn-Sham formulation. The derivation of the polarization contributions to the electrostatic 
energy, Kohn-Sham energy, Kohn-Sham potential, and atomic forces is presented below. 



1. Electrostatic Energy 

Once the Poisson equation is solved and the polarization of the dielectric is known, the 
electrostatic energy of the system can be expressed as 

E*^' = ^ y E (r) • D (r) rfr = ^ J e (p^'"^ (r)) | V0*°* (r) | ^ dr. (13) 

By integrating by parts the last expression, the electrostatic energy can be expressed as 

E'^ = - [ p*°'"*'= (r) (f)'"' (r) dr. (14) 



2 „ 

The same expression can be obtained starting from the vacuum-like problem in Eq. (Q, by 
expressing the electrostatic energy as the sum of the total interaction energy of the charge 
densities of the system, both the solute and the polarization one, 

E^"* = IJ p*°* (r) 0*"* (r) dr = ^J (p'"'"*" (r) + (r)) 0*°* (r) dr (15) 

with the addition of the work done to polarize the dielectric. Such a work, assuming a linear 
behavior for the dielectric, can be shown to be 

W = ~ J P (r) ■ E {r)dr = -\ j P""' (r) 0*°* (r) dr (16) 

that, combined with Eq. OlSp . correctly provides the result in Eq. (1141) . 

Adopting the decomposition of the potential introduced in Eq. (fT2|) . the electrostatic 
energy of the system can be further written as 

j^el j^solute _j_ j^pol (17) 



where the first term is the electrostatic energy of the solute, including both electrons and 
ions, 

^solute = ^ y p^"'"*^ (r) 0"°'"*^ (r) dv, (18) 

and is the analogous of the electrostatic energy of the system in vacuum. Now, by exploiting 
the fact that 

the polarization term can be included in two equivalent ways: following PCM, as the integral 
of the polarization charge density times the potential of the solute charge densities acting on 
it, or, viceversa, as the integral of the solute charge density times the polarization potential 
acting on it 

EP"^ = - [ pP°^ (r) 0'*°^"*'= (r) dr (20) 



psolute ^pol (j.) ^21) 



2. Kohn-Sham Energy and Potential 

From the above equations, the total DFT energy of the system can be expressed as 

jptot [ ^elec Aonsl jpkin \ ^elec] < rpel [ ^elec Aonsl i jpxc [ ^elecl 
^ [P ,P \ = [P \ + ^ [P ) P J + [P \ 

jpkin \ ^elec] < jpsolute [ ^elec Aonsl i jpxc [ ^elec] i jppol [ ^elec Aonsl 

= h [p \+ h [p ,p \ + h [p \ + h"" [p ,p \ 

= {E'°' [p^'-, p--] ) ^^^^^^ + EP°^ [p^'-, p--] (22) 

where, in the last equality, we underline the fact that the quantity 

/ jptot [ ^elec Aons~\\ jpkin [ ^elecl i jpsolute [ ^elec Aonsl i jpxc [ ^elec] /'OQ^ 

[P \)solute = ^ [P [P .P [P \ [^i) 

does not depend explicitly on the polarization charge distribution and, thus, is analogous to 
the total Kohn-Sham energy of a system in vacuum. 

The effective DFT potential acting on the electrons can be written as (see, for example, 
Appendix A of Ref t2fi]). 



^^tot ^pClec pjonsj / ^J^tot ^pclec pionsj 



+ Vpol + V,, (24) 

solute 
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where the first term on the right-hand side includes all of the contributions to the potential 
that do not depend explicitly on the polarization charge density and is analogous to the 
potential of a system in vacuum. As for the two additional terms, the first one is a purely 
local potential contribution and is simply given by the electrostatic potential generated by 
the polarization charge density 

v,,i (r) = cjf"' (r) , (25) 



while the second term arises from the dependence of the dielectric on the electronic charge, 
as defined in Eq. ([1]), and is given by 



^'^ ^ V0*°*(r) . (26) 



l-K dp' 



elec 



3. Forces 

Exploiting the Helmann-Feynman theorem, the force fa acting on an atom a of the solute 
is given by 

.tot ^ _dE^ ^ _dE^ (07^ 
dRa dRa ^ ^ 

where is the position of atom a and we exploit the fact that the electronic kinetic energy 
and exchange-correlation potentials do not depend explicitly on nuclear positions. By using 
the expression for the electrostatic energy in Eq. ( fT3i) . the force is given by 



f toi _ d 1 



a 



dRa Stt 



1 

8^ 



J e(p^'^^ (r)) |V0*°*(r)|'rfr 
/ l^^"*Wr^^^^^^r+ /.(/-(r))^|V0-(r)rrfr 



(28) 



Provided that the dielectric permittivity is not an explicit function of the atomic positions 
of the solute, as is the case of the definition adopted in Eq. ([1]), the first contribution in the 

11 



above equation is equal to zero. The second contribution, instead, can be expressed as 

I. I , ^pelec (r)) y^tot . ^V0*"* (r) dv 

^ j 6 (p^'- (r)) V0*°* (r) ■ V^0*°* (r) dv 



-^V-e(p^'^^(r))V0*"* (r) 



9R„ 



/ (29) 



By applying the partial differential to the first formulation of the electrostatic energy in Eq. 
(fT3|) . an alternative equation for the force is obtained, namely 

which, combined with Eq. (l29l) . provides the useful relation 

|,-.(.)^?^,../,»(.)^?^giW,. (31) 

Similarly to the expressions for the electrostatic and the Kohn-Sham energies in Eqs. ( ITTll 
and ( 122]) . also the interatomic forces can be defined as sums of two terms 

* = - f p^°'"*^ (r) -l-f/,^"'"**^ (r) rfr - /" p'"^""'^ (r) tJ-'/)'"'' (r) rfr (32) 

= (fr).„,.,.+r. (33) 

The first contribution does not depend explicitly on the polarization charge density and is 
analogous to the interatomic force on a system in vacuum. By using Eq. (13T]) and similarly 
to what is done in Eq. ( fT9l) . the polarization contribution to interatomic forces can also be 
expressed as 

= - / (r) ^P^°'"*' (r) dv (34) 
= - I (r) -^/"'^'^ (r) dv, (35) 
where the partial derivative is now applied to analytic functions of nuclear positions. 
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When the dielectric depends instead exphcitly on atomic positions, Eq. (13T]) is no longer 
exact and additional contributions to forces would arise. This is the case, for example, when 
the dielectric constant is entirely defined in terms of a fictitious electronic density centered 



26|. It is important to notice that similar 



at the atomic positions, as described in Ref 
contributions to the forces arise also when the dielectric is defined in terms of the sum of the 
electronic density plus a fictitious ionic density, as is the case in, e.g., the original Fattebert 
and Gygi model, where additional core charges were added to saturate the dielectric constant 
in the solute region . In this contribution similar to the one derived in Ref. [26j 



should be explicitly added to the forces, unless the derivative of the dielectric with respect 
to this fictitious nuclear density is zero, i.e., unless this density is added only in a region of 
space where the resulting dielectric constant is flat. 

IV. CHOICE OF DIELECTRIC FUNCTION 

Although formally equivalent to the original Gygi-Fattebert model, the equations pre- 
sented above have the advantage of highlighting the numerical challenges of the original 
formulation. In particular, the main ingredient of the model is the dielectric function that 
appears in Eq. ([1]), and for the model to work properly and seamlessly some conditions on 
this function should be imposed. These conditions are as follows: 

1. Extrema: the dielectric function should go monotonically from a value of 1 (vacuum) 
inside the molecule to a value of eo in the bulk of the solvent: 

lim eiec^o e (p"''') = eo 

(36) 

limpeiec^^ e (p^'^^) = 1 

2. Flatness inside the solute: the dielectric should be exactly equal to one above a certain 
density threshold, to avoid spurious polarization effects due to the interaction of the 
dielectric with the ion cores. 

3. Flatness in bulk solvent: the dielectric should be exactly equal to eo below a certain 
threshold, to avoid spurious polarization charges in the bulk of the solvent, due to 
potential numerical noise in the exponentially vanishing electronic density away from 
the solute. 
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Figure 1. Numerical test. Left: single water molecule in a periodic cubic cell of (25 a.u.) ^ size. 
The isosurface of the density corresponding to a threshold value of 0.0003 a.u. is shown. Right: 
electronic density of the water molecule along the x axis passing through the center of the oxygen 
atom. 

4. Smoothness: since we are interested in implementing the model in a plane-wave, pe- 
riodic electronic-structure code, the dielectric function and its gradient have to be 
smooth enough to be well described in a three-dimensional grid that has a resolution 
given by the typical density cutoffs used in plane-wave DFT calculations. Such condi- 
tion is crucial to the convergence of the iterative SCF calculation. Moreover, since our 
formulation of the method relies on the polarization density of the medium, this later 
quantity should also be smooth enough to be well described with the density cutoffs 
that are conventionally adopted. 

The original dielectric function by Fattebert and Gygi 



satisfies the first requirement, going very smoothly from e = 1 in the proximity of the 
solute to e = eo in the bulk of the solvent. The smoothness of the function allows it to 
be well defined in the three-dimensional mesh used in typical calculations (see Figure (jj])). 
Nonetheless, convergence issues may arise in some particular systems. Such issues have been 
reported by Sanchez et al. [26] for simulations of two-dimensional systems (i.e. slabs), and 
result from the fact that the gradient of the dielectric function and the dielectric-dependent 




(37) 
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Figure 2. a) Dielectric function (re) and polarization charge (blue) of the water molecule of Figure 
([T|) plotted along the x axis, b) Polarization field [Eq. (|25p ] and additional self-consistent dielectric 
contribution to electronic Hamiltonian [Eq. (|26p ] for the same system. Physical quantities are 
computed using the original formulation by Fattebert and Gygi [Eq. (I37p ] with the parameters 
reported in Ref. [3]- The filled circles correspond to the points of the real space grid used in the 
calculation, corresponding to a density cutoff of 400 Ry. 



contribution to the potential in the energy functional are too abruptly varying to be correctly 
described with typical grid sizes. It is worth noting here that the extra potential term of 
Eq. (126|) is somewhat similar to the polarization charge density: both quantities depends on 
the gradient of the dielectric and of the total field, thus their are both confined in the small 
region around the solute and they both suffer for the same conditioning problems (compare, 
for example, the two quantities in Figure ([2])). Last, the smoothness of the Fattebert and 
Gygi function leads to a polarization density in a region very close to the nuclei, potentially 
breaking the second requirement. As mentioned in Section UTTl this problem has already been 
pointed out in the original paper by Fattebert and Gygi, and it has been solved by using 
an additional density in the definition of e (p{'r)), in order to force the dielectric constant to 
go to e = 1 in the proximity of the nuclei jl|, |2J]. While this approach is correct in theory. 



in practice it introduces some additional terms in the forces acting on the nuclei that have 
to be explicitly considered (see final discussion in the previous Section). Moreover, the final 
result of the calculation depends strongly on the choice of the introduced additional density 
or on the choice of parameters used to model the ionic density (i.e. shape of the nuclei, 
Gaussian spread, etc.). For these reasons it appears that the original function, at least in 

15 



the parametrization adopted in Ref is excessively smooth where it should not (close to 
the nuclei), while not smooth enough where it should (outside the molecule). 

With the aim of correcting these drawbacks, we decided to introduce a new expression 
for the dielectric function by using a piecewise definition of the dielectric of the form 



e 



^0 iPminiPmaa 



S 



(P^'^^) Pn.^n < P'''' < Pma. (38) 



Co p''^'''' < pr. 



where s (x) is a general smooth switching function that decreases monotonically from 
s {pmin) = Co to s (pmax) = 1- Several switching functions s (x) were considered, e.g. the 
trigonometric function 



s{x) = l ^° 



2n 



r, i^Pmax '^) . / n {,Pmax 

Ztt- ^ — sm Ztt- 



i^Pmax Pmin) \ i^Pmax P 



(39) 



Nonetheless, despite their similarity with the original function of Fattebert and Gygi, all 
of the trial functions resulted in even worse or no convergence of the electronic-structure 
calculation at any reasonable cutoffs. The reason for such a ill-conditioning is that the 
region where the s (p'^'^'^) function is defined lies outside of the molecule, where the electronic 
density decays exponentially. Thus, the resulting function s (r) also decays exponentially. 
Nevertheless, since the electronic density is known to vanish exponentially, the problematic 
analytical behavior can be rectified easily by redefining the switching function in terms of 
the logarithm of the density, namely. 



5' [p^^^^) = s(lnp'='"'=) = 1 + 



eo 
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(In Pmax - p''^'"') . (In p^ax - In p'^''^'^) 

ZTT^; : sm ZTT- 



{\y\. p^nax In Pmin) \ {}-^ Pmax In Pm: 

(40) 

In order to increase the smoothness of the dielectric function and of the resulting polar- 
ization charges, a further condition can be imposed on the form of the switching function. In 
particular, from Eq. ( ITOj) it is clear that the polarization charges are mostly given by a term 
of the form Vine (^p^^^^ (r)) ■ V0*°*"' (r). While no assumptions can be made, a priori, on the 
behavior of the gradient of the total field, in defining the dielectric function it is convenient 
to choose a form of the switching function such that the derivative of its logarithm is well 
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Figure 3. Behaviors of the dielectric function (left), its derivative (center) and the derivative of its 
logarithm (right) for the three different switching functions reported in Eq. (j39p . black, Eq. (j40p . 
red, and Eq. (02]), blue. 



behaved. This can be easily achieved by selecting a dielectric function of the form 



in ipmax \r ) 



1 



P ^ Pvnax 



exp (t (Inp'^'^'^)) pmin < p''^'"' < Pv 
^0 P'^'^'^ < Pmin 



(41) 



where t (x) is a general, smooth function that decreases monotonically from t (In pmin) = In 
to t (In Pmax) = 0. Explicitly, in this work, we decided to adopt the trigonometric function 



Ineo 
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i}^ Pmax Pmin) 



— sin 27r 
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(42) 



whose first derivative 



dt (x) 



In eo 



dx (In pr, 



In p„ 



1 — cos I 27r 



(In Pmax x) 
(In Pmax 111 Pmi 



(43) 



vanishes with zero slope at the extrema of the interval of definition, thus making e and 
its first two derivatives continuous in the whole space. It is important to notice that the 
proposed function does not contain any internal adjustable parameters, but only depends 
on the value of the dielectric constant and on the two density thresholds pmin and Pmax- 
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V. ITERATIVE VS MULTIGRID 



In order to effectively compute the polarization field in Eq. (jlj) and related polariza- 
tion charges, different numerical procedures can be adopted. While most of the previous 
implementations rely on multigrid solvers, we have found more advantageous to rely on an 
iterative procedure, derived along the lines of a similar approach first introduced in PCM. 

In particular, given the total charge density of the solute, the second term for polarization 
charge appearing in Eq. (fTOj) is readily obtained, while for the first one, that we label p**'^^; 

= i- V In e {p'^'' (r)) ■ V0*°*"' (r) , (44) 

the following procedure can be adopted: 

1. In order to avoid calculations of polarization effects for a system with a non converged 
electronic density, solvent effects are computed only when the accuracy of the SCF 
calculation reaches a given threshold value r^'"^. 

2. At the first iteration for the dielectric, the initial polarization charge is fixed to be 
equal to zero. All following calculations at each subsequent electronic SCF step use 
the polarization density from the previous step, namely 

tier _ g gj,g^ polarizatiou calculation 

(45) 

Po'^^ = Po;J oterwise 

3. The total density of the system (solute plus polarization) at iteration n is computed 



as 



pf (r) = p-'"*^ (r) + pf (r) (46) 



^psolute ^ ^ter ^ ^ ^\P^^ ^K^^ ^solute (47) 

= ,(^ei(,)) P^°'""W+P^W- (48) 

4. The gradient of the total field is computed in reciprocal space via one Fast Fourier 
Transform (FFT) and three inverse FFTs (IFFTs): 

(r) "^^^ (g) (49) 

V0*.°ti (g) = ^P'n (g) (50) 

V0*ti (g) (r) . (51) 
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5. The iterative part of the polarization charge at the (n + l)-th step is computed by 
using Eq. (jH]), namely 



ptf- (r) = V In 6 (r)) ■ V0*„ti (0 • (52) 

6. A linear mixing of the polarization at the (n + l)-th and n-th steps is performed, in 
order to stabilize the iterative procedure: 

f^uTi (r) := W^Ti (r) + {I ' v) (r) , (53) 
with 7] a given mixing parameter (usually 7] ^ 0.6), and the residual p^'^^ (r) is computed 

r (r) = p^*- (r) - pf ^ (r) . (54) 

7. The polarization density is converged when 

< r^"\ (55) 

with rP°' a given tolerance, and the iterative procedure is stopped. 

Typical results for the polarization density of neutral molecules in water are reported in 
Figure 1^. 

For numerical stability, we find important to compute Vine (p^'^^ (r)) in real space, e.g. 
by using finite differences. While computing such a gradient via FFTs would yield a truly 
variational energy expression, going through reciprocal space would unavoidably give rise to 
spurious polarization charges in "forbidden regions". In particular, polarization charges close 
to the nuclei would form, thereby severely compromising the convergence of our procedure. 
Several real-space finite-differences schemes were tested, with the simplest 3-points central 
differences algorithm providing converged results for the majority of tests and simulations. 
Nonetheless, for microcanonical molecular dynamics simulations, total energy conservation 
was found to strongly depend on the order of the finite-difference scheme adopted, with only 
high-order algorithms providing the necessary degree of accuracy in the interatomic forces 
(see the discussion in subsection IVII B"3l) . 



Similarly to standard SCF calculations, convergence of iterative calculations depends 
crucially on the use of a mixing scheme. Different mixing schemes can be adopted, such 
as the Anderson mixing 36||, the modified Broyden mixing 37| or the Direct Inversion in 
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Figure 4. Positive (orange) and negative (blue) polarization charges of a water molecule, on the 
left, and of a piperazine molecule, on the right (molecule 013 of the training set, structure in 
Figure ([6|)). Brighter (darker) colors correspond to values of the density around ±0.03 a.u. (±0.01 
a.u.). Stronger polarization charge densities are found around the lone pairs of electrons of the 
etheroatoms (oxygen in red and nitrogen in blue) and on the more polarized hydrogen atoms, such 
as the ones attached to the etheroatoms. 



the Iterative Subspace (DIIS) 38|]. In order to reduce the number of numerical parameters 
involved in the calculations, in this article only results obtained with a simple linear mixing 
are reported. Apart from enhancing convergence, no significant effects on the final results 
exist for different mixing parameters in the range 0.2 < rj < 0.6. 

The total number of polarization iterations required is strictly related to the tolerance 
^poi jjjiposed on the residuals. A large value of r^"' in Eq. (155|) would allow the polarization 
procedure to stop in a few iterations. Nonetheless, poorly converged polarization screening 
does prevent the electrons from reaching their ground state, requiring a much higher number 
of SCF steps to converge (see top and middle panels of Figure (|5])). 

Since a single SCF step is much more computationally expensive than a polarization 
iteration, which only involves four FFTs, it has been found preferable to impose a tight con- 
vergence tolerance on the polarization charge (bottom panel of Figure ([5])). More advanced 
schemes that rely on a tolerance that depends on the convergence of the electronic density, 
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Figure 5. Effects of tlie tolerance on the convergence of polarization charges r^"', Eq. (j55p . Top: 
when the iterative tolerance is set to a large value, only one polarization iteration per SCF cycle is 
needed. The polarization accuracy increases in unison with the SCF cycles, but convergence of the 
electronic density is slowed down. Bottom: for a small value of the iterative tolerance (i.e. tight 
convergence in the dielectric cycle), many polarization iterations are needed for each SCF cycle. 
Global convergence to the electronic ground state is not affected by the presence of the solvent, or 
actually it is improved with respect to the simulation in vacuum. Middle: intermediate case. 
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have been tested. Moreover, a scheme could be envisaged where the polarization density and 
the electronic density are optimized with the same common iterative algorithm, i.e., where 
the polarization density is mixed once per SCF cycle and with the same procedure used for 
the electronic density. Nonetheless, for sake of simplicity, only results obtained with a fixed 
tolerance on the polarization charge are presented. 

The behavior of the SCF procedure is also strictly dependent on the choice of the thresh- 
old value for the onset of the polarization calculation, r^'-'^ . Waiting for the electrons to be 
converged before starting the polarization calculation could seem an efficient way of proceed- 
ing. Nonetheless, switching on the polarization fields strongly affects the SCF procedure, 
bringing the electrons as far from their ground state as at first SCF steps. For this rea- 
son, larger values of t^^^ , of the order of 0.1-0.5 Ry, were found to give smoother SCF 
convergence, with no effects on the final results. 

Overall, the whole procedure benefits significantly from the advantageous scaling of FFT 
techniques. Contrary to multigrid solvers, FFTs are easily available in fast, well established, 
parallel libraries and form already a key ingredient in most plane-wave electronic-structure 
codes. Moreover, the method relies on fully periodic boundary conditions, so it can be 
straightforwardly applied to periodic systems in arbitrary unit-cell geometries without re- 
quiring a cubic cell and imposing arbitrary Dirichlet boundary conditions on the potential, 
as in the multigrid cases. In polar solvents, periodicity does not affect final results, since 
the solvent screens very effectively the charge density of the solute. Moreover, periodic im- 
age correction schemes, such as the Makov-Payne correction [39] or countercharge methods 
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4l| . can easily be adopted to include the polarization charge density, by treating it at 



the same level as the molecular charge distribution. 

VI. ADDITIONAL SOLVATION TERMS 

The model described in the previous sections focuses only on electrostatic contributions 
to solvation Q 

AG"' = G"' - G°, (56) 

where G*' = (-E'*°*)t,acu«m ab-initio energy of the isolated solute in vacuum and G*^' is the 
analogous quantity computed in solution, i.e. from Eq. (122|) . Such a contribution is always 
negative, i.e., allowing the medium to polarize always stabilizes the solute and lowers its 
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total energy. Of course, other effects than electrostatic are present in any solvation process 
and they play a crucial role in balancing the overall solvation energies that can be negative 
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43|, PCM introduced other 



or positive. According to the formal definitions of Ben-Nairn 
non-electrostatic terms in the solute Hamiltonian, still assuming a continuum approach in 
the description of the medium. The main terms can be divided (see Eq. (79) of Ref. |3j) 
into: 

AG""' = AG"' + G™'' + G'^'p + G^'' + AG*"" + PAK. (57) 

In the above formula, the cavitation energy G'^"''" corresponds to the energy necessary to build 
into the solvent the cavity containing the solute. The repulsion G*"*^^ and dispersion G*^** 
terms are the continuum equivalent of the nonbonded short-range interactions generated 
by the Pauli exclusion principle and by van der Waals interactions. The thermal motion 
contribution G*"* arises from the change in the vibrational and rotational properties of the 
solvated system with respect to the isolated one, and the pressure term P/W takes into 
account the change in volume of the solvated system. Different models exist for each of 
the non-electrostatic terms, a comprehensive review being given in Ref. In standard 
PCM calculations only the first three non electrostatic terms are explicitly accounted for. 
Of these, cavitation and repulsion energies are positive by definition, while the dispersion 
contribution is always negative. Overall, these terms tend to cancel each other and their 
effect on molecular properties and chemical reactions has been generally regarded as less 
important than the electrostatic term. 



The model by Fattebert and Gygi was extended by Scherlis et al. [2| to include a cavita- 
tion term, simply expressed as the product of the experimental surface tension of the solvent 
7 times the surface S of the solute cavity, namely 

G™'' = 75. (58) 

( — I 

The surface above is the "quantum surface" introduced in |3| and defined, via the "quantum 

n 

volume" [3j, by finite difference between two isosurfaces of the electronic density as 

S = jdr (/'- (r)) - (r)) } x t^^. (59) 

From the functional derivative of the cavitation energy (Eq. (158|) ) with respect to the charge 
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density, an extra potential term of the form 



8p' 



elec 



(r)) 



(/'^^ (r)) 



X 



EE- 



dip (r) djp (r) didjp (r 



|Vp' 



elec 



-E 



2 _e(ec 



elec 



(60) 



can be straightforwardly added to the Kohn-Sham Hamiltonian. The function entering 
in the above equations was originally designed to be a step function, switching from zero to 
one at the threshold po^ namely. 



^Po (/^^^ (r)) = ^ (/'^^ (r) - po) 



p^'ec (^^) _ > 

1 p^'^'^ (r) - Po < 



(61) 



To improve the numerical stability of the algorithm, a smoothed step function of the same 
form, i.e. (p*^'^^ (r) — pg) , was also suggested. The finite difference parameter A was then 



used in Eq. 0591) to determine two adjacent isosurfaces separated by a constant separation 
in terms of electronic density. 

Subsequently, in order to use a definition consistent with the dielectric function used in 
the electrostatic solvation calculation, the function 



^Po,/3 (r)) 



(p^'-(r) /po) 



2/3 



1 



1 



1 



(62) 



(p^'- (r) /p,f' 

was adopted. We note that, while very similar in spirit to the original formulation by 
Cococcioni et. al. [3], the use of Eq. ( 162|) together with Eq. ( l59l) is not formally exact, 
since the parameter po entering in Eq. 0621) is not in a linear relationship with the argument 
of the switching function. Thus, applying the finite difference procedure to this arbitrary 
parameter as in Eq. (159|) does not correspond to calculating the spatial distance between the 
two isosurfaces. For this reason, surfaces computed with Eq. (159|) result in an overestimate 
of approximately a factor of 1.2 of the correct quantum surface. Such an overestimate can be 
avoided by applying the finite difference procedure directly to the argument of the switching 
function, namely. 



S 



dr \ ^{^y { p^'- (r) - I] - ^{^1 Tp^'- (r) + | 



X 



A 



(63) 



where now the 'd^a} (p^'^^ (r)) function is a given switching function that depends on an 
arbitrary set of parameters {a} and that goes from zero to one at a fixed threshold larger 
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than A/2. We notice here that, while the above expression is vahd for any kind of switching 
function, for a function defined according to the original formulation of Cococcioni et 
al. [Eq. (16T]) ]. Eqs. (159|) and ( !63|) are equivalent. For the sake of consistency with the 
electrostatic solvation term the following definition is adopted 

"pmin,pmax \^ I ) — 1 ) 

eo — i 

where e,o,p„^„,p„,, [p^^^" (r)) is now given by Eq. gl]). 

In a similar way, the method of Cococcioni et al. to treat systems under pressure can 
be immediately extended to the calculation of the PAV term that appears in Eq. (1571) . 
Explicitly, by adopting the same switching function used to define the solute surface Eq. 
the solute volume can be expressed as 

V = j drd {p''^' (r)) , (65) 

that gives rise to an extra potential term in the Kohn-Sham Hamiltonian of the form 

= (^^^ 

By including the PV term in the total energies of both the system in vacuum and in solution, 
the PAV contribution is automatically included in the solvation free energy computed via 
Eq. Such a contribution can be important for systems under pressure while, for 

systems at standard pressures, it can be safely neglected. 

As for the remaining contributions to the solvation free energy, we have decided to treat 
them in a simplified way, their explicit modeling being the subject of future developments. 
In particular, similarly to other models of solvation, the thermal motion contribution has 
been neglected, while we express the sum of dispersion and repulsion free energies as a term 
linearly proportional to the quantum surface and the quantum volume of the molecular 
cavity, namely 

^rep ^ Qdis = ^g^ f^Y^ (67) 

where the two factors a and /3 are solvent-specific tunable parameters that can be fitted, 
together with the other parameters in the model, e.g. to reproduce total solvation energies. 
This approach is coherent, for example, with the definition of repulsion energy used in PCM 
where such a term is proportional to the solute electronic density that lies outside the 
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molecular cavity: in a model where the molecular cavity is defined on an isosurface of the 
density, the amount of electronic charge outside the cavity will be proportional to the surface 
of the cavity. Moreover, such an approach is similar in spirit, but with less parameters, to the 
one adopted by Cramer and Truhlar in their Generalized Born (GB) method of solvation 
and, more recently, in their SMx models (see Ref. [45] and references therein). In these 
classes of methods, all of the non-electrostatic terms of solvation are described as the sum of 
atomic terms, proportional to the solvent-accessible surface area of each atom of the solute. 
The final expression for the solvation free energy of our simplified model is thus given by 

AG'"' = AG'' (eo, p™„, Pma.) + {a + j)S + PV, (68) 

where the main parameters involved in the electrostatic contribution are explicitly reported. 



VII. RESULTS 



A. Numerical Details 



A large set of experimental solvation free energies for 240 small neutral organic molecules 
in water was used to test the proposed computational methodology. The molecules of the 
set have been collected in Ref. |4Q] to test the accuracy of classical molecular dynamics free 
energy perturbation calculations. Based on hierarchical clustering, a smaller representative 
set of 13 molecules (reported in Figure [6]) spanning the main functional groups of organic 



molecules is also provided in Ref. |46| and is used here to test the convergence of numerical 
parameters and to fit tunable parameters. 

PCM calculations were performed at the Hartree-Fock (HF) and density functional the- 
ory (DFT) level, using both the G03 ^\ and G09 |49| versions of the GAUSSIAN code. For 
the sake of brevity, in the remaining discussions the labels g03/g09 will be used to refer to 



the two different flavors of PCM 
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29| implemented as defaults in the G03/G09 versions 
of the GAUSSIAN code. Both standard PBE and hybrid B3LYP exchange-correlation func- 
tionals were tested. The dependence of the results on the chosen theoretical approximations 
was found to be minor and generally negligible compared to the overall agreement with 
experimental results. Thus, only results obtained with the PBE functional will be reported. 
Geometries were optimized both in vacuum and in solution with a double zeta 6-31g(d) basis 
set. Starting from the relaxed geometries, total energies were computed with a triple zeta 
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Figure 6. Fitting set: chemical structures and labels of the thirteen small, neutral, organic molecules 
used to fit the solvation model and to test the accuracy of the model with respect to numerical 



parameters. Names, solvation energies anc 



in Tables in the Supplementary Material 



computed results for the molecules in the set are reported 
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6-311+g(d,p) basis set. Due to the lack of such basis sets for the iodine atom, calculations 
with GAUSSIAN were not performed on the eight molecules of the set containing this atomic 
species. For the other molecules, calculations were performed using PCM to compute both 
electrostatic and cavitation energies. For the latter contribution and in order to be coher- 
ent with the original definition of the cavitation term, calculations were performed with a 



cavity built using unsealed Bondi's atomic radii 



50] . For all other calculations the default 



cavities were used. In particular, lEF-PCM as implemented in G03 adopts a cavity defined 



according to the solvent excluded surface 
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and based on spheres of United Atom radii 



G09, instead, relies on a simpler cavity 



as approximated by the GePol algorithm 



52|, 



53|. The version of lEF-PCM implemented in 



29|, built as the combination of atom centered van 



54| . and scaled by a factor of 1.1. 



der Waals spheres of Universal Force-Field radii 

Our continuum solvation model has been implemented in the public-domain PWSCF 
parallel code included in the Quantum-ESPRESSO package [55^, based on DFT, periodic- 
boundary conditions, plane-wave basis sets and pseudopotentials (PP) to represent the 
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ion-electron interactions. All the calculations reported in this work were performed at 
the Gamma point of the Brillouin zone (as is appropriate for molecules), using the PBE 
exchange-correlation functional and Vanderbilt ultrasoft pseudopotentials (USPP), as con- 
tained in the PSlibrary of A. Dal Corso [56]. For bromine and iodine we have used the 
USPP available online [57]. Kohn-Sham wavefunctions and charge densities were expanded 
in plane waves up to kinetic energy cutoffs of 30 and 300 Ry, respectively. For molecules 
containing fluorine, these cutoffs were further raised to 45 and 450 Ry. Simulation cells were 
chosen to be at least 15.0 a.u. larger than the maximum size of the molecule in vacuum, 
and not smaller than 20.0 a.u.. Spurious periodic-image effects were taken into account, for 
all the molecules, using the Makov-Payne corrective scheme 39] . 



In order to ensure a consistent description of solvent effects across our large set of sim- 
ulations used to fit the tunable parameters of the method, a larger basis set was adopted, 
corresponding to wave-function and density cutoffs of 40 and 400 Ry, respectively. Moreover, 
since the effect of the tunable parameters on the final geometries of the benchmark molecules 
was found to be of secondary importance (less than 0.1 kcal/mol), the fitting simulations 
were performed with no geometry relaxation and starting from molecular geometries opti- 
mized using a reasonable set of parameters (p"^"^' = 0.008 a.u., p™"- = 0.00015 a.u., 7 = 72 
dyn/cm, a = dyn/cm, /3 = GPa). 

The accuracy of the method and, in particular, of the forces calculation were tested by 
means of Born-Oppeneimer ab-initio molecular dynamics simulations of a prototypical sys- 
tem in vacuum and in solution. Equilibration of the system was first performed with a 
timestep of 60 a.u. (~ 1.45 fs) in the NVT ensemble at 300 K, as imposed by a Berendsen 
thermostat. Following equilibration at constant temperature, simulations were performed 
in the microcanonical ensemble. In order to ensure proper energy conservation for the sim- 
ulation in vacuum, some key simulations parameters had to be tightened. In particular, 
the wavefunction and density cutoffs were raised to 40 Ry and 600 Ry, respectively. Sim- 
ilarly, the timestep was decreased to 20 a.u. (^ 0.48 fs) and the threshold controlling the 
convergence of the electronic SCF procedure was imposed to be 10^^^ Ry. The same set of 
parameters was then used for the simulations in the presence of the solvent. 

In order to study the influence of the numerical parameters of the method on the final 
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results, unrelaxed electrostatic solvation energies were calculated as 



where, contrary to Eq. (l56|) . the electrostatic energy of the system in solution is computed 
using the geometry optimized in vacuum (G"^''*'). This was done with the aim of removing a 
further source of errors from the analysis of the numerical sensitivity of the method. 



B. Parametrization 

As summarized in Eq. fl68p . our approach involves six main parameters. Of these, the 
bulk dielectric constant eo and the surface tension 7 represent physical quantities, specific 
of the solvent, that can be computed from first principles or, more often, extracted from 
experimental results. The remaining four parameters, i.e. the two thresholds entering into 
the definition of the dielectric function and the a and /3 constants of Eq. ( l67l) . are tunable 
quantities that affect the computed solvation energies. As such, these parameters have 
to be fitted on reference calculations or experimental data, as reported in the following 
subsection (IVIIB It is important to underline here that the comparisons with PCM are 
not intended to represent a direct assessment of the accuracy of PCM or of other continuum 
models in the literature, but rather they serve two other purposes. First, they are intended 
to show the intrinsic flexibility of the SCCS formulation, which can reproduce well results 
of other approaches exploiting a much reduced number of parameters. Second, using well 
established continuum models simplifies the fitting procedure in comparison to relying solely 
on experimental data. For the above reasons and due to their wide spread use and assessment 
in the literature, comparisons and fittings are only made with the two main versions of lEF- 



.tionp 

m 
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PCM, as implemented in the Gaussian simulation packages j29. |58| . We are aware t 



hat 



and some of these 
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other continuum models exist in the literature 
could also be fitted within the SCCS formulation; nonetheless such extensive comparison is 
beyond the purpose of this article. 

In addition to the parameters reported in Eq. ( l68i) . the proposed methodologies rely 
on a certain number of computational parameters. These are purely numerical quantities, 
that influence the speed and stability of the calculations but, apart from pathological cases, 
should not affect the calculated results. These parameters include the standard parameters 
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of plane-wave periodic boundary calculations (wavefunction and density cutoffs, cell size, 
pseudopotentials) together with some solvent specific quantities, such as the tolerances of 
the iterative approach r^^^ and r^°', the parameter r] entering into the polarization density 
mixing approach, the finite difference parameter A used in the calculation of the molecular 
surface, and the radii of the fictitious ionic densities used in the Ewald evaluation of electro- 
static contributions. While, in theory, these parameters can be freely adjusted to make the 
method more efficient and more stable, a careful analysis of their effects on the final results 
is mandatory and is reported in Appendix A (IVIIip . 



1. Fitting of tunable parameters 

In particular, since the proposed model mostly focuses on the electrostatic contribution 
to solvation free energies, it is difficult and even not physical to fit tunable parameters of 
the model to reproduce total solvation free energies of molecules. Indeed, such a fit would 
intrinsically rely on cancellation of errors and would be highly dependent on the solutes 
and solvents considered. Furthermore, it is mostly the electrostatic part of solvation that 
plays a role in determining the molecular properties of a solute in solution, such as solvent 
effects on molecular spectra (IR, UV or NMR) or on reaction rates. An ideal fitting would 
thus require parametrizing the model on these molecular properties. Since already existing 
continuum models of solvation have generally achieved a very good description of molecular 
properties in solution, and the lEF-PCM predictions for electrostatic solvation energies have 
been successfully used by several methods 



45 



60 



-|62| as the basis onto which one could 



parametrize more complete models of solvation, we decided to use the electrostatic part of 
the results obtained with the PCM to parametrize the electrostatic part of our model, while 
the non-electrostatic terms have been tuned in a second stage in order to improve the fit 
with respect to experimental total solvation free energies. 

The two main parameters of the electrostatic part of SCCS to be fitted are the two 
density thresholds p™"- and p^'^^ that enter into the definition of the dielectric constant [Eq. 
( HTl) . In particular, p™'*^ defines the onset of the dielectric and, for the flatness conditions 
(see Section IIVI) to be fulfilled, should correspond to a density isosurface as far from the 
nuclei as possible. Thus we consider only values lower than 0.005 a.u. for p^"-^ , As for 
ptnin^ since the electronic density in plane- wave codes is computed via FFTs and can present 
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Figure 7. a) SCCS mean absolute error with respect to PCM-G09 electrostatic solvation energies 
for the 13 molecules of the training set. b) Comparison of SCCS electrostatic predictions with 
PCM-G09 predictions for the 240 molecules of the full set. 



small oscillations also in regions of space far from the molecule, it is important to define this 
threshold such that it is not influenced by numerical noise in the electronic density. For this 
reason only values of p™*" higher than 0.00005 a.u. have been considered in the fit. 

Mean absolute errors (MAE) relative to PCM electrostatic solvation energies as a function 
of p™"- and p™"^ are reported in Figures ([7]) and ([8]), for G09 and G03 respectively, and in 



the Supplementary Material j47l |. From both graphs, it appears that there is no unique 
optimal fit for the two parameters p™"- and p™"^. In fact, for each value of p™" there exists 
a value of p™'"'^ that provides the optimal fit with respect to the reference. This observation 
offers some additional freedom in the choice of the two parameters. 

The agreement with the G09 results is excellent with a mean absolute error (MAE) lower 
than 0.4 kcal/mol on the 13 benchmark molecules of the training set (see Figure (ITh'))- 
Out of the possible choices of the two parameters, the values of p™" = 0.0001 a.u. and 
pmax _ Q QQ]^5 ^ wcrc Selected (fitg09), giving a minimal MAE of 0.36 kcal/mol for the 13 
molecules of the training set. The MAE is found to further decrease to 0.27 kcal/mol for the 
full set of 240 molecules reflecting the remarkable transferability of the fit performed on only 
13 benchmark molecules. This results is made more remarkable by the fact that our model 
relies on only two parameters for the definition of the cavity hosting the solute, at variance 
with PCM that involves in its simplest implementation a different empirical parameter for 
each atomic species. 
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Figure 8. a) SCCS mean absolute error with respect to PCM-G03 electrostatic solvation energies 
for the 13 molecules of the training set. b) Comparison of SCCS electrostatic predictions with 
PCM-G03 predictions for the 240 molecules of the full set. 

Parametrizing the method on the electrostatic solvation energies obtained with the G03 
version of the PCM model resulted in a less close agreement, with MAE of the order of 1.0 
kcal/mol (see Figure (Ek)); again, also in this case, results improve when the full set of 240 
neutral molecules is considered. In particular, using p™" = 0.0001 a.u. and p^""^ = 0.0050 
a.u. (fitgOS), an average error of 0.95 kcal/mol is obtained (see Figure (Eh))- By comparing 
Figures ([7^) and ([8^), it is important to notice that, for each value of p™"^ the value 
of p"^"-^ that gives the best fit of the PCM results in the G03 case is always higher than 
the corresponding G09 fit. This corresponds to a cavity that is closer to the solute, and 
is a behavior to be expected. The reason for such a result lies in the different definition 
of the molecular cavity in the two version of the model, the G09 version relying, in the 
default implementation, on a >a,e. .cute cavitvH,S„nUa.y, the reason for the less good 
agreement between SCCS and PCM G03 results is most likely due to the more elaborate 
definition of the molecular cavity in PCM G03, where an additional set of spheres was added 
to the cavity to smooth out intersecting regions. 

As a further test of the flexibility of the proposed SCCS model, a comparison of the 
cavitation energies computed with Eq. ( l58l) versus that of the Claverie-Pierotti method 
implemented in PCM was performed and reported in Figure ([9]). Also in this case, even 
though the two methods are based on different physical assumptions, results show a remark- 
able agreement. Errors of the order of 0.8 kcal/mol were found on the computed cavitation 
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Figure 9. a) SCCS mean absolute error with respect to cavitation energies computed by PCM- 
G03 for the 13 molecules of the training set. b) Comparison of SCCS cavitation predictions with 
PCM-G03 predictions for the 240 molecules of the full set, using the cavity parameters optimized 
to reproduce the electrostatic solvation energy of PCM-G09. The value of 7 is set to be equal to 
72 dyn/cm. 

energies, despite the fact that they are typically four times larger than solvation energies. 
Calculations of C^"""" on the whole set of molecules were performed using the experimental 
value of 7 = 72 dyn/cm for liquid water at room temperature and the fitg09 set of parame- 
ters, since these set lie close to the region of best match with the PCM cavitation energies 
(see Figure ([9])). 

Eventually, while keeping the two parameters of the cavity fixed to either the fitg09 or 
fitgOS values, and assuming a surface tension 7 = 72 dyn/cm for water at room temperature, 
the parameters a and /3 entering into the definition of the non-electrostatic terms, Eq. (167|) , 
have been optimized to best reproduce the experimental results of solvation free energies 
(see Figures and (fTT]) ). 

In both cases, an almost perfectly linear behavior was found for the error as a function of 
the two parameters. This is probably due to the small variation in the size of the molecules 
considered in the training set, which makes the volume and the surface terms to be linearly 
related (see Figure (fT2|) ). When looking at the behavior of the quantum volume with respect 
to the quantum surface of the solute for the 240 molecules of the full set, we found a trend 
in between the one expected for an ideally spherical system, for which 



V oc 53/2, 



(70) 
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Figure 10. a) SCCS mean absolute error with respect to experiments for the 13 molecules of the 
training set. b) Comparison of SCCS solvation energies with experimental solvation energies for the 
240 molecules of the full set. In both graphs, the electrostatic parameters of the SCCS are fitted 
on G09 (p™" = 0.0001 a.u., = 0.0015 a.u.). Results in b) are obtained with the fitg09 set of 

parameters (see Table (jl])). 




Figure 11. a) SCCS mean absolute error with respect to experiments for the 13 molecules of the 
training set. b) Comparison of SCCS solvation energies with experimental solvation energies for the 
240 molecules of the full set. In both graphs, the electrostatic parameters of the SCCS are fitted 
on G03 (p""*" = 0.0001 a.u., p'"''^ = 0.0050 a.u.). Results in b) are obtained with the fitg03 set of 
parameters (see Table (jl])). 

and a linear system, composed by a cylinder of length /, terminated by hemispheres, and of 
radius equal to an average atomic size (r = 4 a.u.), for which 

V (xS. (71) 
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Figure 12. Relation between the quantum surfaces and the quantum volumes of the 240 molecules 
of the full set, computed using SCCS with the fitg03 set of parameters. For comparison, the volume 
vs surface trends are reported for the cases of a spherical system (in red) and of a cylindrical system 
terminated by hemispheres and of fixed base radius equal to 4 a.u. (in green). 



As a result of this almost linear behavior that reflects the corrugation of the solvation 
shell, we are again left with the freedom of choosing one of the two parameters, while 
performing an optimal fit on the other. In an effort to simplify the method and conform 
to other solvation models in the literature, we decided to choose the sum of the surface 
parameters a and 7 such that the volume contribution vanishes, i.e. in such a manner that 
the corresponding optimal value of /3 is equal to zero. This choice corresponds to a value of 
a + 7 = 2.5 dyn/cm for the fitg09 set of parameters, or a value of a + 7 = 11.5 dyn/cm in 
the fitg03 case. Total solvation energies for the 13-molecules training set with these choices 
of parameters resulted in a MAE with respect to experimental results of about 1.6 kcal/mol 



[see Tables in the Supplementary Material 



47|). When considering the whole set of neutral 



molecules, it is clear that the fitg03 set of parameters presents the best trend with respect 
to experimental data. This results in a better agreement with experiment for the fitg03 set 
compared to the fitg09, with MAEs of 1.31 kcal/mol and 1.53 kcal/mol respectively. 

Other fits were performed and tested on the full set of molecules, to check the consistency 
of the results along different combinations of parameters (see Table ([T])). It is instructive to 
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Table I. Mean absolute error on the total set of molecules for different sets of parameters. For each 
set of parameters, the solvation energy of water in water (experimental value of -6.30 kcal/mol) has 
also been reported. 



compare two choices of parameters that best reproduce the electrostatic energies computed 
with G03, namely the fitg03 and fitgOS' sets in Table ([T]). In this case, it appears that MAEs 
of comparable magnitude on the 13-molecule training set (see Figure ([8])) correspond to 
similar errors on the full set of molecules, thus further validating the choice of the fitting 
set. Moreover, it also appears that relaxing the condition on the second non-electrostatic 
parameter /3 has minor effects on the overall ability of the method to reproduce experimental 
energies, with MAE improving by 0.02-0.04 kcal/mol for two of the sets considered. In the 
third set considered (fitgOS) an improvement on the quality of the fit of the order of 0.11 
kcal/mol was found. Although not explicitly parametrized for it, this last set of parameters 
(fitg03+/3) is also the one that better reproduces the solvation energy of water in water, 
with an excellent agreement with the experimental value of -6.30 kcal/mol (see Table ([T])). 
Nonetheless, the relative improvement in the accuracy of the results, obtained by relaxing 
the condition on the f3 parameter, is still marginal compared to the overall performances of 
the method. 

Indeed, reported errors show that the proposed method is not yet able to be used as 
a tool to quantitatively predict free energies of solvation. Nonetheless, when compared 
with similar computational methods, the above results are remarkable, since they have been 
obtained with a model based on only two independent parameters. In particular, while the 
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Figure 13. Error on solvation energies, computed with SCCS and the fitg03+/3 set of parameters, 
for the extended set of molecules, divided according to their main functional groups (branched 
— branched alkanes; acids — carboxylic acids; cyclo — cycloalkanes; S2 = disulfides; multi — 
multiple halogens). Some compounds with more than one functional groups have not been explicitly 
classified. Colored rectangles have been used to identify the different classes of compounds, with 
different colors reflecting the degree of accuracy of the method: green with maximum errors less 
than ±1.5 kcal/mol; blue for maximum errors less than ±3.0 kcal/mol; red for sets with larger 
errors (up to ±6.0 kcal/mol) 



model of Cramer and Truhlar is able to achieve a MAE of the order of 0.55 kcal/mol for 
solvation of neutral compounds in aqueous solution 45|, it relies on a much higher number 



of tunable parameters. Similarly, the use of more tunable methods to describe the dispersion 
and repulsion contributions were shown to improve the agreement with experiment for other 
lEF-PCM based methods, with MAE of 0.58 kcal/mol and 1.01 kcal/mol reported for the 
COSMOtherm and the MST methods (see Ref. |62| and references therein). 



2. Errors vs functional groups 

In order to better understand which physical aspects mostly limit the accuracy of the 
method, the errors on solvation energies, computed with the fitg03+/3 parameters have been 
reported in Figure (fT3|) for all the 240 molecules of the extended set classified according 
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to their main functional groups. The performances of the method appear to be very good 
for a large set of organic molecules, including linear and cyclic alkanes, alkenes, arenes, 
aldehydes, ketones, and esters. With the noticeable exception of fluorine, etheroatoms and 
halogens do not affect the accuracy of the results, and compounds containing chlorine, 
bromine, iodine, mono- and di- sulfides, nitro and nitrile groups show reasonable errors 
(the largest deviations being less than 1.0 kcal/mol), well within the accuracy of the best 
implicit solvation computational methods. Considering only the above classes of compounds, 
a moderate MAE of 0.46 kcal/mol is obtained. 

Instead, the performance of the method is poor for two specific types of functional groups: 
amines and carboxylic acids, that show positive (amines) and negative (acids) errors on sol- 
vation energies of up to 4-5 kcal/mol. The acid and basic natures of these two classes of 
compounds is probably the cause of large discrepancies between computed and experimen- 
tal results. In particular, SCCS completely neglects the possibility that different species of 
the same solute could be in equilibrium in solution, corresponding to more or less proto- 
nation/deprotonation. Thermodynamically meaningful solvation energies should instead be 
computed as a weighted average of the solvation energy of the neutral species and that of the 
protonated/deprotonated species. A refinement of the fit to include the effects of acid-base 
equilibria will be the subject of further developments. 

In addition to the acid/basic compounds, poor results are obtained also for ethers, alco- 
hols and amides. Whether this is due to the lack of a correct description of the hydrogen 
bonding environment with a continuum model is still under study. Nonetheless, the bet- 
ter performance of alcohols with respect to ethers seem to suggest that hydrogen bonding 
should not be the only reason for such discrepancies. Fluorine compounds, and in particular 
multifluorinated alcohols and ethers, also show poor results compared to experiment. This 
effect could be again due to the lack of explicit hydrogen bonding in SCCS, that should 
affect fluorinated compounds more than other halogenated molecules. Moreover, the dis- 
crepancies in the computed results could be due to the different numerical treatment given 
to molecules containing fluorine, where higher density and wavefunction cutoffs were used in 
order to compensate for the hardness of the fluorine pseudopotentials. Eventually, alkynes 
and branched alkanes show somewhat poor agreement with experiment, probably due to the 
unideal description of the cavitation and repulsion potentials. 
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3. Molecular dynamics simulations 



In order to validate the accuracy of the proposed method for MD simulations, a prototyp- 
ical system composed by a cyclic tetramer of deuterated water molecules has been studied, 
similarly to what was done in Ref. As reported in the numerical details subsection 

IVII A| convergence of constant energy simulations in vacuum, as reflected by total energy 
conservation, required to sensibly tighten some of the key parameters of the simulation. 
When the same parameters of the calculations in vacuum were adopted for a simulation 
in the presence of continuum aqueous solution, a sensible worsening of the accuracy of the 
method was found. This behaviour is due to a small inaccuracy in the calculation of the 
forces, that has no significant effects on the geometry relaxations of the tested molecules. 
Such an error could be traced back to be related to the use of a simplified 3-points central 
difference schemes in the evaluation of the gradient of the dielectric that appears in the 
definition of the polarization charges (see, e.g., Eqs. (fTOj) and (HI])). Several alternative 
and more advanced schemes can be adopted for a finite difference calculation of such a gra- 
dient, including higher order central differences or smooth noise-robust differentiators j63|. 
By exploiting a number of points in the discretized space larger than three, these schemes 
can produce an unphysical non-vanishing gradient in the fiat regions of space close to the 
solvent-vacuum interface. Nonetheless, the more accurate description of the gradient in the 
interfacial region produces interatomic forces more accurate than the ones obtained by the 
simplest 3-points central difference method, thus resulting in total energy conservation of 
the same quality of the simulation in vacuum (compare left and right panel of Figure (IT^ ). 



VIII. CONCLUSIONS 

A revised self-consistent continuum solvation model has been presented that overcomes 
most of the limitations of previous approaches and includes, in a simplified way, additional 
non-electrostatic contributions to solvation free energies via a combined use of the concepts of 
quantum volume and quantum surface (sj for isolated fragments. The model has been imple- 
mented using a novel iterative approach that is robust, efficient, fully parallel and convenient 
to implement in electronic-structure codes, both for isolated or periodic geometries indepen- 
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Figure 14. Total and potential energies of a water tetramer along Born-Oppeneimer NVE molecular 
dynamics trajectories, in vacuum (left panel) and in a continuum acqueous solution (right panel). 
The same accuracy on total energy conservation is found for the simulation in vacuum and in 
solution, when a high-order (e.g. 5-points, blue curves) central differences (CD) scheme is exploited 
to compute in real space the gradient of the dielectric function. When the simpler 3-points CD 
scheme (green curves) is adopted, a small systematic error on the interatomic forces results in poor 
energy conservation along the MD simulation. 



dently of the charge density representation or the Bravais lattice chosen. Parametrization 
of the method to reproduce the experimental results has been performed, and the overall 
performance over an extended set of solvation free energies in water compares favorably 
with the results from reference theoretical methods of similar physical background and more 
parameter intensive. The static dielectric constant of the solvent and one parameter allow 
to fit the electrostatic energy provided by the PCM model with a mean absolute error of 
0.3 kcal/mol on a set of 240 neutral solutes, and two parameters allow to fit experimental 
solvation energies on the same set with a mean absolute error of 1.3 kcal/mol. A detailed 
analysis of these results, broken down along different classes of chemical compounds, shows 
that most molecules display even closer agreement, whereby larger errors are mostly limited 
to self- dissociating species and strong hydrogen-bond forming compounds. Last, a careful 
analysis of the effects of numerical parameters on the predictions of the method have been 
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presented, to make present results fully reproducible. 
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APPENDIX A: EFFECT OF NUMERICAL PARAMETERS 

In an effort of making the present results reproducible and to study the accuracy of the 
model proposed, a careful analysis of the effect of all other numerical and computational 
parameters on solvation energies is mandatory. In fact, only a few of these have been 
discussed and shown to have a negligible effect on the final results. This is the case of 
the finite-difference parameter A that appears in the definition of the cavitation energy, Eq. 
(160|) . which was found to have negligible effects on the computed energies [2]. For this reason, 
a value of A = 0.0001 a.u. was used throughout the simulations. Similarly, as discussed in 
Section IVj the parameters entering into the iterative algorithm to compute the polarization 
charges show no significant impact on the final results. 

The main parameter that enters into an ab-initio calculation is the size of the basis set, 
that in the plane-wave codes for which the model was developed corresponds to the kinetic 
energy cutoffs of the plane waves used to represent the wavefunctions and electronic density. 
The density cutoff, in particular, corresponds to the grid size in real space used in all the 
calculations of the polarization density (see Section (jV])) and, as such, it represents a crucial 
numerical parameter of the model. Since the electronic density is given by the square of the 
wavefunction, a cutoff on the density four times larger than the one on the wavefunction 
is usually adopted. Nonetheless, the use of ultrasoft pseudopotential, while allowing to 
lower the plane-wave cutoffs for the description of the wavefunction, requires a much higher 
ratio between the two cutoffs, with density cutoffs 8 to 10 times larger than the ones on 
wavefunctions. 
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Figure 15. Convergence of the calculations with respect to wavefunction (wfc) cutoff. Left: con- 
vergence of total energy of molecule 013 of set (piperazine) in vacuum and in solution (solvent 
parameters: p™"- = 0.0001 a.u., p'^'^^ = 0.001 a.u.). Note that the two curves are on top of each 
other on the scale of the graph. In the inset, a closer view on the convergence of the error for 
large wavefunction cutoffs is reported. Center: convergence of the absolute error on solvation free 
energy /S.G^°^'^ for the thirteen molecules of the fitting set (solvent parameters: = 0.0001 

a.u., p"^°-^ = 0.001 a.u.). Right: convergence of the mean absolute error on solvation free ener- 
gies AG*"^'*^ of the fitting set for different values of solvent parameters. Error bars represent one 
standard deviation from the average. 



Convergence of the electrostatic solvation free energy with wavefuntion cutoffs for the 
thirteen molecules of the fitting set is reported in Figure (ITSl) . Results show a reasonably 
fast convergence with respect to wavefunction cutoff: while variations on the computed 
energies are of the order of the kcal/mol even for the larger cutoffs considered, solvation 
free energies are well converged for wavefunction cutoffs of the order of 30 Ry. At lower 
cutoffs a small subset of molecules present errors on solvation energies of the order of 0.5-1.0 
kcal/mol: since the molecules on this subset are the ones containing oxygen, it seems that 
the error is mostly related to the use of a pseudopotential for oxygen that is harder than 
the ones used for the other atomic types. The fact that changing the solvent parameters 
has a negligible effect on the convergence of the results also supports the conclusion that 
the solvation calculation is independent on the choice of the wavefunction cutoffs. 

Different results are obtained for the convergence with respect to the density cutoff, as 
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Figure 16. Convergence of the calculations with respect to density cutoff. Left: convergence of 
total energy of molecule 013 of set (piperazine) in vacuum and in solution (solvent parameters: 
pmtn _ 0.0001 a.u., p'^"-^ = 0.001 a.u.). Center: convergence of the absolute error on solvation 
free energy AG*°''^ for the thirteen molecules of the fitting set (solvent parameters: p'™" = 0.0001 
a.u., p^^^ = 0.001 a.u.). Right: convergence of the mean absolute error on solvation free energies 
^Qsoifi q£ ^j^g fitting set for different values of solvent parameters. Error bars represent one standard 
deviation from the average. 

shown in Figure fll6p . Convergence is slightly slower than with respect to wavefunction cut- 
offs, with errors of the order of 0.2-0.3 kcal/mol for the normal range of this parameter (i.e. 
between 300 and 400 Ry). This residual error appear to be mostly related in the first place 
to the sharpness of the polarization charge and potentials. Moreover, the numerical details 
of the method could also be crucial, as reflected by the fact that the polarization charge 
density is not neutral. In particular, the calculation of V In e j^p^'*^^ (r)] that enters into Eq. 
f l44p is performed using a simplified approach, that involves a centered three-point finite 
difference algorithm in real space. Nonetheless, for the range of solvent parameters investi- 
gated, the error is generally less than 0.1 kcal/mol, thus well within the typical accuracies 
of conventional continuum solvation models. 

In Figure (fT7|) the cell-size dependence of the electrostatic contribution to solvation free 
energies is reported, for the thirteen molecules of the fitting set, with and without corrections 
for periodic-boundary conditions. As already mentioned, cell sizes were chosen to be equal 
to the maximum size of the molecule optimized in vacuum, plus an adjustable amount of 
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Figure 17. Convergence of the calculations with respect to cell extra size, where effective cell size is 
defined as the maximum dimension of the molecule plus cell extra size. Left: convergence of total 
energy of molecule 013 of set (piperazine) in vacuum and in solution, with no periodic boundary 
corrections (squares) and with Makov-Payne correction (circles) applied. Center: convergence of 
the absolute error on solvation free energy AG*°'''^ for the thirteen molecules of the fitting set with 
no periodic boundary corrections. Right: convergence of the absolute error on solvation free energy 
^Qsolfi £qj. thirteen molecules of the fitting set with Makov-Payne periodic boundary correction. 

10 to 40 a.u. (denoted cell extra size in Figure (fT7|) ). Since calculations are performed 
for a solvent with a high dielectric constant, screening of solute charges makes the effects 
of periodic-boundary conditions relatively small. When including corrections for periodic- 
boundary conditions, such as in the Makov-Payne scheme j39j] , no significant variations were 
found for solvation energies calculated from Eq. (169|) . while large changes of the energy 
in vacuum appear. Thus, the error is mostly related to the accuracy of the calculation in 
vacuum. Moreover, the overall error was found to be lower than 0.1 kcal/mol for most of the 
molecules in the fitting set, while larger errors up to 0.5 kcal/mol were found for the molecules 
with larger dipole moments. When a correction for periodic images is explicitly accounted 
for, convergence is quite fast even for the smallest cell sizes considered. More advanced and 
accurate periodic boundary correction schemes, such as the density-countercharge correction 
(DCC) 40|], can be easily extended to calculations with the SCCS, by explicitly considering 
the polarization charge density. 

Last, the effect on solvation free energies of the fictitious atomic density used to compute 
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Figure 18. Convergence of the calculations with respect to atomic spreads. Center: convergence 
of the absolute error on solvation free energy for the thirteen molecules of the fitting set 

(solvent parameters: p™"- = 0.0001 a.u., p^"-^ = 0.001 a.u.). Right: convergence of the mean 
absolute error on solvation free energies AC"''^ of the fitting set for different values of the solvent 
parameters. Error bars represent one standard deviation from the average. 



the electrostatic field of nuclei and core electrons has to be taken into account (see Figure 
( IT8|) ). In order to study such an effect, we assumed the nuclei to be described by Gaussians 
of fixed spread a that do not depend on the atomic type. This is a typical approximation 
in plane-wave codes, and the correct electrostatic is usually recovered in the total energy by 
including Ewald complementary-error-function electrostatic terms. Note that more advanced 
approaches can be adopted, e.g. by defining Gaussian spreads to depend on pseudopotential 
radii, or by explicitly accounting for the core electronic density. 

As explained in Section (lIVp . in order for the energy of the solvated system to be well 
defined, the fictitious ionic charge density should be entirely included in the solvent exclusion 
region. Such a region is directly related to the parameters p"^™ and p"*"^' used in the definition 
of the dielectric constant. In particular, a high value of p^""^ implies the presence of dielectric 
contributions close to the nuclei. As a result, for higher values of p™''^^ the effect of the ionic 
density becomes sensitive at lower values of a. Nonetheless, in the whole range of density 
thresholds considered we always obtain well converged results for Gaussian spreads in the 
window 0.3 a.u. < a < 1.0 a.u.. 
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